Abstract

The high repair cost of (n, k) Maximum Distance Separable (MDS) erasure codes has recently 
motivated a new class of codes, called Regenerating Codes, that optimally trade off storage cost for repair 
bandwidth. On one end of this spectrum of Regenerating Codes are Minimum Storage Regenerating 
(MSR) codes that can match the minimum storage cost of MDS codes while also significantly reducing 
repair bandwidth. In this paper, we describe Exact-MSR codes which allow for any failed nodes 
(whether they are systematic or parity nodes) to be regenerated exactly rather than only functionally or 
information-equivalently. We show that Exact-MSR codes come with no loss of optimality with respect to 
random-network-coding based MSR codes (matching the cutset-based lower bound on repair bandwidth) 
for the cases of: (a) $k/n \le 1/2$; and (b) $k \le 3$. Our constructive approach is based on interference
alignment techniques, and, unlike the previous class of random-network-coding based approaches, we
provide explicit and deterministic coding schemes that require a finite-field size of at most $2(n-k)$.

Index Terms 

Interference Alignment, Minimum Storage Regenerating (MSR) Codes, Repair Bandwidth 



I. Introduction 

In distributed storage systems, maximum distance separable (MDS) erasure codes are well-known
coding schemes that can offer maximum reliability for a given storage overhead. For an
(n, k) MDS code for storage, a source file of size $\mathcal{M}$ bits is divided equally into k units (of
size $\mathcal{M}/k$ bits each), and these k data units are expanded into n encoded units and stored at n
nodes. The code guarantees that a user or Data Collector (DC) can reconstruct the source file by
connecting to any arbitrary k nodes. In other words, any $(n-k)$ node failures can be tolerated
with a minimum storage cost of $\mathcal{M}/k$ at each of the n nodes. While MDS codes are optimal in terms of
reliability versus storage overhead, they come with a significant maintenance overhead when it
comes to repairing failed encoded nodes to restore the MDS system-wide property. Specifically,
consider the failure of a single encoded node and the cost needed to restore this node. It can be
shown that this repair incurs an aggregate cost of $\mathcal{M}$ bits of information from k nodes. Since
each encoded unit contains only $\mathcal{M}/k$ bits of information, this represents a k-fold inefficiency with
respect to the repair bandwidth.

This challenge has motivated a new class of coding schemes, called Regenerating Codes [1],
[2], which target the information-theoretic optimal tradeoff between storage cost and repair
bandwidth. On one end of this spectrum of Regenerating Codes are Minimum Storage Regenerating
(MSR) codes that can match the minimum storage cost of MDS codes while also significantly
reducing repair bandwidth. As shown in [1], [2], the fundamental tradeoff between bandwidth
and storage depends on the number of nodes that are connected to repair a failed node, simply
called the degree d, where $k \le d \le n-1$. The optimal tradeoff is characterized by

$$(\alpha, \gamma) = \left( \frac{\mathcal{M}}{k}, \; \frac{\mathcal{M}}{k} \cdot \frac{d}{d-k+1} \right), \qquad (1)$$

where $\alpha$ and $\gamma$ denote the optimal storage cost and repair bandwidth, respectively, for repairing
a single failed node, while retaining the MDS-code property for the user. Note that this code
requires the same minimal storage cost (of size $\mathcal{M}/k$) as that of conventional MDS codes, while
substantially reducing repair bandwidth by a factor of $k(d-k+1)/d$ (e.g., for (n, k, d) = (31, 6, 30),
there is a 5x bandwidth reduction). In this paper, without loss of generality, we normalize the
repair-bandwidth-per-link ($\gamma/d$) to be 1, making $\mathcal{M} = k(d-k+1)$. One can partition a whole
file into smaller chunks so that each has a size of $k(d-k+1)$.¹
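
As a quick numerical illustration of the tradeoff point (1), the following minimal sketch evaluates the storage cost, the repair bandwidth, and the reduction factor for the (31, 6, 30) example. The function names `msr_point` and `bandwidth_reduction` are hypothetical helpers introduced only for this illustration; they are not part of the paper.

```python
from fractions import Fraction


def msr_point(file_size, k, d):
    """Return (alpha, gamma) of the MSR point (1) for a file of the given size."""
    alpha = Fraction(file_size, k)
    gamma = Fraction(file_size, k) * Fraction(d, d - k + 1)
    return alpha, gamma


def bandwidth_reduction(k, d):
    """Ratio of the MDS repair traffic (the whole file) to the MSR repair traffic gamma."""
    return Fraction(k * (d - k + 1), d)


n, k, d = 31, 6, 30
file_size = k * (d - k + 1)          # normalization that makes gamma / d equal to 1
alpha, gamma = msr_point(file_size, k, d)
per_link = gamma / d                 # repair-bandwidth-per-link
reduction = bandwidth_reduction(k, d)
print(alpha, gamma, per_link, reduction)
```

Exact rational arithmetic is used so that the normalization $\gamma/d = 1$ can be checked without floating-point rounding.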

While MSR codes enjoy substantial benefits over MDS codes, they come with some limitations
in construction. Specifically, the achievable schemes in [1], [2] that meet the optimal tradeoff
bound of (1) restore failed nodes in a functional manner only, using a random-network-coding
based framework. This means that the replacement nodes maintain the MDS-code property (that
any k out of n nodes can allow for the data to be reconstructed) but do not exactly replicate the
information content of the failed nodes.

¹ In practice, file sizes range from the order of kilobits (Kb) to gigabits (Gb). Hence, it is reasonable to consider this arbitrary size of the
chunk.

Mere functional repair can be limiting. First, in many applications of interest, there is a need to
maintain the code in systematic form, i.e., where the user data in the form of k information units
are exactly stored at k nodes and parity information (mixtures of k information units) is stored
at the remaining $(n-k)$ nodes. Secondly, under functional repair, additional overhead information
needs to be exchanged for continually updating repairing-and-decoding rules whenever a failure
occurs. This can significantly increase system overhead. A third problem is that the
random-network-coding based solution of [1] can require a huge finite-field size, which can significantly
increase the computational complexity of encoding-and-decoding². Lastly, functional repair is
undesirable in storage security applications in the face of eavesdroppers. In this case, information
leakage occurs continually due to the dynamics of repairing-and-decoding rules that can be
potentially observed by eavesdroppers [3].

These drawbacks motivate the need for exact repair of failed nodes. This leads to the following
question: is there a price for attaining the optimal tradeoff of (1) with the extra constraint of
exact repair? The work in [4] sheds some light on this question: specifically, it was shown that
under scalar linear codes³, when $k/n > 1/2 + 2/n$, there is a price for exact repair. For large n, this
case boils down to $k/n > 1/2$, i.e., redundancy less than two. Now what about $k/n \le 1/2$? This paper
resolves this open problem and shows that it is indeed possible to attain the optimal tradeoff of
(1) for the case of $k/n \le 1/2$ (and $d \ge 2k-1$), while also guaranteeing exact repair. Furthermore,
we show that for the special case of $k \le 3$, there is no price for exact repair, regardless of the
value of n. The interesting special case in this class is the (5, 3) Exact-MSR code⁴, which is not
covered by the first case of $k/n \le 1/2$.

² In [1], Dimakis-Godfrey-Wu-Wainwright-Ramchandran translated the regenerating-codes problem into a multicast
communication problem, where random-network-coding-based schemes require a huge field size, especially for large networks. In storage
problems, the field size issue is further aggravated by the need to support a dynamically expanding network size due to the need
for continual repair.

³ In scalar linear codes, symbols are not allowed to be split into arbitrarily small sub-symbols as with vector linear codes.
This is equivalent to having large block-lengths in the classical setting. Under non-linear and vector linear codes, whether or
not the optimal tradeoff can be achieved for this regime remains open.

⁴ Independently, Cullina-Dimakis-Ho in [5] found (5, 3) E-MSR codes defined over GF(3), based on a search algorithm.

Our achievable scheme builds on the concept of interference alignment, which was introduced
in the context of wireless communication networks [6], [7]. The idea of interference alignment
is to align multiple interference signals in a signal subspace whose dimension is smaller than the
number of interferers. Specifically, consider the following setup where a decoder has to decode
one desired signal which is linearly interfered with by two separate undesired signals. How
many linear equations (relating to the number of channel uses) does the decoder need to recover
its desired input signal? As the aggregate signal dimension spanned by desired and undesired
signals is at most three, the decoder can naively recover its signal of interest with access to
three linearly independent equations in the three unknown signals. However, as the decoder is
interested in only one of the three signals, it can decode its desired unknown signal even if it
has access to only two equations, provided the two undesired signals are judiciously aligned in
a 1-dimensional subspace. See [6]-[8] for details.

We will show in the sequel how this concept relates intimately to our repair problem. At a 
high level, the connection comes from our repair problem involving recovery of a subset (related 
to the subspace spanned by a failed node) of the overall aggregate signal space (related to the 
entire user data dimension). There are, however, significant differences, some beneficial and some
detrimental. On the positive side, while in the wireless problem, the equations are provided by 
nature (in the form of channel gain coefficients), in our repair problem, the coefficients of the 
equations are man-made choices, representing a part of the overall design space. On the flip side, 
however, the MDS requirement of our storage code and the multiple failure configurations that 
need to be simultaneously addressed with a single code design generate multiple interference 
alignment constraints that need to be simultaneously satisfied. This is particularly acute for 
a large value of k, as the number of possible failure configurations increases with n (which 
increases with k). Finally, another difference comes from the finite-field constraint of our repair 
problem. 

We propose a common-eigenvector based conceptual framework (explained in Section IV) that
covers all possible failure configurations. Based on this framework, we develop an interference
alignment design technique for exact repair. We also propose another interference alignment
scheme for a (5, 3) code⁵, which in turn shows the optimality of the cutset bound (1) for the
case $k \le 3$. As in [4], our coding schemes are deterministic and require a field size of at most
$2(n-k)$. This is in stark contrast to the random-network-coding based solutions [1].

⁵ The finite-field nature of the problem makes this challenging.

Fig. 1. Repair models for distributed storage systems. In exact repair, the failed nodes are exactly regenerated, thus restoring
lost encoded fragments with their exact replicas. In functional repair, the requirement is relaxed: the newly generated node can
contain different data from that of the failed node as long as the repaired system maintains the MDS-code property. In partially
exact repair, only systematic nodes are repaired exactly, while parity nodes are repaired only functionally.

II. Connection to Related Work 

As stated earlier, Regenerating Codes, which cover an entire spectrum of optimal tradeoffs
between repair bandwidth and storage cost, were introduced in [1], [2]. As discussed, MSR
codes occupy one end of this spectrum corresponding to minimum storage. At the other end of
the spectrum live Minimum Bandwidth Regenerating (MBR) codes corresponding to minimum
repair bandwidth. The optimal tradeoffs described in [1], [2] are based on
random-network-coding based approaches, which guarantee only functional repair.

The topic of exact repair codes has received attention in the recent literature [4], [10], [5], [9],
[11]. Wu and Dimakis in [9] showed that the MSR point (1) can be attained for the cases of
k = 2 and k = n − 1. Rashmi-Shah-Kumar-Ramchandran in [10] showed that for d = n − 1, the
optimal MBR point can be achieved with a deterministic scheme requiring a small finite-field size
and zero repair-coding-cost. Subsequently, Shah-Rashmi-Kumar-Ramchandran in [4] developed
partially exact codes for the MSR point corresponding to $k/n \le 1/2 + 2/n$, where exact repair is limited
to the systematic component of the code. See Fig. 1. Finding the fundamental limits under exact
repair of all nodes (including parity) remained an open problem. A key contribution of this paper
is to resolve this open problem by showing that E-MSR codes come with no extra cost over
the optimal tradeoff of (1) for the case of $k/n \le 1/2$ (and $d \ge 2k-1$). For the most general case,
finding the fundamental limits under exact repair constraints for all values of (n, k, d) remains
an open problem.

The constructive framework proposed in [4] forms the inspiration for our proposed solution in
this paper. Indeed, we show that the code introduced in [4] for exact repair of only the systematic
nodes can also be used to repair the non-systematic (parity) node failures exactly, provided repair
construction schemes are appropriately designed. This design for ensuring exact repair of all
nodes is challenging and had remained an open problem: resolving this for the case of $k/n \le 1/2$
(and $d \ge 2k-1$) is a key contribution of this work. Another contribution of our work is the
systematic development of a generalized family of code structures (of which the code structure
of [4] is a special case), together with the associated optimal repair construction schemes. This
generalized family of codes provides conceptual insights into the structure of solutions for the
exact repair problem, while also opening up a much larger constructive design space of solutions.

III. Interference Alignment for Distributed Storage Repair

Linear network coding [12], [13] (that allows multiple messages to be linearly combined at
network nodes) has been established recently as a useful tool for addressing interference issues
even in wireline networks where all the communication links are orthogonal and non-interfering.
This attribute was first observed in [9], where it was shown that interference alignment could
be exploited for storage networks, specifically for minimum storage regenerating (MSR) codes
having small k (k = 2). However, generalizing interference alignment to large values of k (even
k = 3) proves to be challenging, as we describe in the sequel. In order to appreciate this better,
let us first review the scheme of [9] that was applied to the exact repair problem. We will then
address the difficulty of extending interference alignment for larger systems and describe how
to address this in Section IV.

A. Review of (4, 2) E-MSR Codes [9]

Fig. 2 illustrates an interference alignment scheme for a (4, 2) MDS code defined over GF(5).
First, one can easily check the MDS property of the code, i.e., the entire source file can be
reconstructed from any k (= 2) nodes out of n (= 4) nodes. Let us see how failed node 1
(storing $(a_1, a_2)$) can be exactly repaired. We assume that the degree d (the number of storage
nodes connected to repair a failed node) is 3, and that the source file size $\mathcal{M}$ is 4. The cutset bound (1)
then gives the fundamental limits of: storage cost $\alpha = 2$; and repair-bandwidth-per-link $\gamma/d = 1$.

Fig. 2. Interference alignment for a (4, 2) E-MSR code defined over GF(5) [9]. Choosing appropriate projection weights, we
can align the interference space of $(b_1, b_2)$ into the one-dimensional linear space spanned by $[1, 1]^t$. As a result, we can successfully
decode the 2 desired unknowns $(a_1, a_2)$ from 3 equations containing 4 unknowns $(a_1, a_2, b_1, b_2)$.
The example illustrated in Fig. 2 shows that the parameter set described above is achievable
using interference alignment. Here is a summary of the scheme. First notice that since the
bandwidth-per-link is 1, the two symbols in each storage node are projected into a scalar variable
with projection weights. Choosing appropriate weights, we get the equations shown in Fig. 2:
$(b_1 + b_2)$; $a_1 + 2a_2 + (b_1 + b_2)$; $2a_1 + a_2 + (b_1 + b_2)$. Observe that the undesired signals $(b_1, b_2)$
(interference) are aligned onto a 1-dimensional linear subspace, thereby achieving interference
alignment. Therefore, we can successfully decode $(a_1, a_2)$ with three equations although there
are four unknowns. Similarly, we can repair $(b_1, b_2)$ when it has failed.
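
The repair of node 1 summarized above can be simulated directly. The following is a minimal sketch, assuming the encoding matrices quoted in the caption of Fig. 3 ($\mathbf{A}_1 = [1,0;0,2]$, $\mathbf{A}_2 = [2,0;0,1]$, $\mathbf{B}_1 = \mathbf{B}_2 = \mathbf{I}$) and projection weights $[1, 1]$ at every survivor; the function names `encode` and `repair_node1` are hypothetical and used only for this illustration.

```python
from itertools import product

P = 5  # field size: all arithmetic is over GF(5)


def encode(a, b):
    """Return the contents of the four nodes of the (4, 2) code."""
    a1, a2 = a
    b1, b2 = b
    return [
        (a1, a2),                              # node 1 (systematic)
        (b1, b2),                              # node 2 (systematic)
        ((a1 + b1) % P, (2 * a2 + b2) % P),    # node 3: a^t A1 + b^t B1
        ((2 * a1 + b1) % P, (a2 + b2) % P),    # node 4: a^t A2 + b^t B2
    ]


def repair_node1(nodes):
    """Rebuild (a1, a2) from one projected symbol per surviving node."""
    # Each survivor sends the sum of its two symbols (projection weights [1, 1]).
    y2, y3, y4 = (sum(nodes[i]) % P for i in (1, 2, 3))
    # The interference b1 + b2 is aligned, so one subtraction cancels it.
    r3 = (y3 - y2) % P                         # a1 + 2 a2
    r4 = (y4 - y2) % P                         # 2 a1 + a2
    # Solve [[1, 2], [2, 1]] (a1, a2)^t = (r3, r4)^t over GF(5).
    det_inv = pow((1 * 1 - 2 * 2) % P, -1, P)
    a1 = (det_inv * (r3 - 2 * r4)) % P
    a2 = (det_inv * (-2 * r3 + r4)) % P
    return (a1, a2)


# Exhaustive check over all 5^4 possible source files.
all_repaired = all(
    repair_node1(encode(a, b)) == a
    for a in product(range(P), repeat=2)
    for b in product(range(P), repeat=2)
)
print(all_repaired)
```

Only three scalars are downloaded, one per survivor, which matches the repair-bandwidth-per-link of 1 with d = 3. The modular inverse via `pow(x, -1, P)` requires Python 3.8 or later.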

B. Matrix Notation 

We introduce matrix notation that provides a geometric interpretation of interference alignment
and is useful for generalization. Let $\mathbf{a} = (a_1, a_2)^t$ and $\mathbf{b} = (b_1, b_2)^t$ be 2-dimensional
information-unit vectors, where $(\cdot)^t$ indicates a transpose. Let $\mathbf{A}_i$ and $\mathbf{B}_i$ be 2-by-2 encoding matrices for
parity node i (i = 1, 2), which contain encoding coefficients for the linear combination of the
components of "a" and "b". For example, parity node 1 stores information in the form of
$\mathbf{a}^t\mathbf{A}_1 + \mathbf{b}^t\mathbf{B}_1$, as shown in Fig. 3. The encoding matrices for systematic nodes are not explicitly
defined since those are trivially inferred. Finally, we define 2-dimensional projection vectors $\mathbf{v}_{\alpha i}$'s
(i = 1, 2, 3).

Fig. 3. Geometric interpretation of interference alignment. The blue solid-line and red dashed-line vectors indicate linear
subspaces with respect to "a" and "b", respectively. The choice of $\mathbf{v}_{\alpha 2} = \mathbf{B}_1^{-1}\mathbf{v}_{\alpha 1}$ and $\mathbf{v}_{\alpha 3} = \mathbf{B}_2^{-1}\mathbf{v}_{\alpha 1}$ enables interference
alignment. For the specific example of Fig. 2, the corresponding encoding matrices are $\mathbf{A}_1 = [1, 0; 0, 2]$, $\mathbf{B}_1 = [1, 0; 0, 1]$,
$\mathbf{A}_2 = [2, 0; 0, 1]$, $\mathbf{B}_2 = [1, 0; 0, 1]$.

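Let us consider exact repair of systematic node 1. By connecting to three nodes, we get:
$\mathbf{b}^t\mathbf{v}_{\alpha 1}$; $\mathbf{a}^t(\mathbf{A}_1\mathbf{v}_{\alpha 2}) + \mathbf{b}^t(\mathbf{B}_1\mathbf{v}_{\alpha 2})$; $\mathbf{a}^t(\mathbf{A}_2\mathbf{v}_{\alpha 3}) + \mathbf{b}^t(\mathbf{B}_2\mathbf{v}_{\alpha 3})$. Recall the goal, which is to decode 2
desired unknowns out of 3 equations including 4 unknowns. To achieve this goal, we need:

$$\operatorname{rank}\left( \begin{bmatrix} (\mathbf{A}_1\mathbf{v}_{\alpha 2})^t \\ (\mathbf{A}_2\mathbf{v}_{\alpha 3})^t \end{bmatrix} \right) = 2; \quad \operatorname{rank}\left( \begin{bmatrix} \mathbf{v}_{\alpha 1}^t \\ (\mathbf{B}_1\mathbf{v}_{\alpha 2})^t \\ (\mathbf{B}_2\mathbf{v}_{\alpha 3})^t \end{bmatrix} \right) = 1. \qquad (2)$$

The second condition can be met by setting $\mathbf{v}_{\alpha 2} = \mathbf{B}_1^{-1}\mathbf{v}_{\alpha 1}$ and $\mathbf{v}_{\alpha 3} = \mathbf{B}_2^{-1}\mathbf{v}_{\alpha 1}$. This choice
forces the interference space to be collapsed into a one-dimensional linear subspace, thereby
achieving interference alignment. With this setting, the first condition now becomes

$$\operatorname{rank}\left( \begin{bmatrix} \mathbf{A}_1\mathbf{B}_1^{-1}\mathbf{v}_{\alpha 1} & \mathbf{A}_2\mathbf{B}_2^{-1}\mathbf{v}_{\alpha 1} \end{bmatrix} \right) = 2. \qquad (3)$$

It can be easily verified that the choice of $\mathbf{A}_i$'s and $\mathbf{B}_i$'s given in Figs. 2 and 3 guarantees the
above condition. When node 2 fails, we get a similar condition:

$$\operatorname{rank}\left( \begin{bmatrix} \mathbf{B}_1\mathbf{A}_1^{-1}\mathbf{v}_{\beta 1} & \mathbf{B}_2\mathbf{A}_2^{-1}\mathbf{v}_{\beta 1} \end{bmatrix} \right) = 2, \qquad (4)$$

where the $\mathbf{v}_{\beta i}$'s denote projection vectors for node 2 repair. This condition also holds under the
given choice of encoding matrices.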
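
Conditions (3) and (4) can be checked mechanically for the matrices quoted in Fig. 3. The sketch below assumes the projection vectors $\mathbf{v}_{\alpha 1} = \mathbf{v}_{\beta 1} = [1, 1]^t$ (the text does not state $\mathbf{v}_{\beta 1}$, so that choice is an assumption) and exploits the fact that all four encoding matrices are diagonal; the helper names are hypothetical.

```python
P = 5  # arithmetic over GF(5)

# Diagonals of the encoding matrices from Fig. 3.
A1, A2 = (1, 2), (2, 1)
B1, B2 = (1, 1), (1, 1)


def diag_inv(diag):
    """Inverse of a diagonal matrix over GF(P), given by its diagonal."""
    return tuple(pow(x, -1, P) for x in diag)


def diag_apply(diag, vec):
    """Multiply a diagonal matrix by a vector over GF(P)."""
    return tuple((x * y) % P for x, y in zip(diag, vec))


def det2(col1, col2):
    """Determinant of the 2-by-2 matrix [col1 col2] over GF(P)."""
    return (col1[0] * col2[1] - col1[1] * col2[0]) % P


v = (1, 1)  # assumed projection vector for both repairs

# Condition (3): [A1 B1^{-1} v, A2 B2^{-1} v] must have rank 2.
c3 = (diag_apply(A1, diag_apply(diag_inv(B1), v)),
      diag_apply(A2, diag_apply(diag_inv(B2), v)))
cond3 = det2(*c3) != 0

# Condition (4): [B1 A1^{-1} v, B2 A2^{-1} v] must have rank 2.
c4 = (diag_apply(B1, diag_apply(diag_inv(A1), v)),
      diag_apply(B2, diag_apply(diag_inv(A2), v)))
cond4 = det2(*c4) != 0

print(c3, cond3, c4, cond4)
```

A 2-by-2 matrix has rank 2 exactly when its determinant is non-zero in the field, which is why a determinant test suffices here.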

C. Connection with Interference Channels in Communication Problems 
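Observe the three equations shown in Fig. 3:

$$\underbrace{\begin{bmatrix} \mathbf{0}^t \\ (\mathbf{A}_1\mathbf{v}_{\alpha 2})^t \\ (\mathbf{A}_2\mathbf{v}_{\alpha 3})^t \end{bmatrix} \mathbf{a}}_{\text{desired signals}} + \underbrace{\begin{bmatrix} \mathbf{v}_{\alpha 1}^t \\ (\mathbf{B}_1\mathbf{v}_{\alpha 2})^t \\ (\mathbf{B}_2\mathbf{v}_{\alpha 3})^t \end{bmatrix} \mathbf{b}}_{\text{interference}}$$

Separating into two parts, we can view this problem as a wireless communication problem,
wherein a subset of the information is desired to be decoded in the presence of interference.
Note that for each term (e.g., $\mathbf{A}_1\mathbf{v}_{\alpha 2}$), the matrix $\mathbf{A}_1$ and vector $\mathbf{v}_{\alpha 2}$ correspond to the channel
matrix and transmission vector in wireless communication problems, respectively.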

There are, however, significant differences. In the wireless communication problem, the chan- 
nel matrices are provided by nature and therefore not controllable. The transmission strategy 
alone (vector variables) can be controlled for achieving interference alignment. On the other 
hand, in our storage repair problems, both matrices and vectors are controllable, i.e., projection 
vectors and encoding matrices can be arbitrarily designed, resulting in more flexibility. However, 
our storage repair problem comes with unparalleled challenges due to the MDS requirement and 
the multiple failure configurations. These induce multiple interference alignment constraints that 
need to be simultaneously satisfied. What makes this difficult is that the encoding matrices, once 
designed, must be the same for all repair configurations. This is particularly acute for large 
values of k (even k = 3), as the number of possible failure configurations increases with n 
(which increases with k). 

IV. A Proposed Framework for Exact-MSR Codes 

We propose a common-eigenvector based conceptual framework to address the exact repair
problem. This framework draws its inspiration from the work in [4], which guarantees the exact
repair of systematic nodes, while satisfying the MDS code property, but which does not provide
exact repair of failed parity nodes. In providing a solution for the exact repair of all nodes,
we propose here a generalized family of codes (of which the code in [4] is a special case).
This both provides insights into the structure of codes for exact repair of all nodes and
opens up a much larger design space for constructive solutions. Specifically, we propose a
common-eigenvector based approach building on a certain elementary matrix property [14], [15]
for the generalized code construction. Moreover, as in [4], our proposed coding schemes are
deterministic and constructive, requiring a symbol alphabet-size of at most $(2n - 2k)$.

Our framework consists of four components: (1) developing a family of codes⁶ for exact repair
of systematic nodes based on the common-eigenvector concept; (2) drawing a dual relationship
between the systematic and parity node repair; (3) guaranteeing the MDS property of the code;
(4) constructing codes with finite-field alphabets. The framework covers the case of $n \ge 2k$ (and
$d \ge 2k-1$). It turns out that the $(2k, k, 2k-1)$ code case contains the key design ingredients
and the case of $n > 2k$ can be derived from this (see Section VI). Hence, we first focus on
the simplest example: (6, 3, 5) E-MSR codes. Later in Section V we will generalize this to
arbitrary (n, k, d) codes in the class.

⁶ Interestingly, the structure of the code in [4] turns out to work for the exact repair of both systematic and parity nodes
provided appropriate repair schemes are developed.

Fig. 4. Difficulty of achieving interference alignment simultaneously.

A. Code Structure for Systematic Node Repair

For $k \ge 3$ (more-than-two interfering information units), achieving interference alignment for
exact repair turns out to be significantly more complex than the k = 2 case. Fig. 4 illustrates this
difficulty through the example of repairing node 1 for a (6, 3, 5) code. By the optimal tradeoff
(1), the choice of $\mathcal{M} = 9$ gives $\alpha = 3$ and $\gamma/d = 1$. Let $\mathbf{a} = (a_1, a_2, a_3)^t$, $\mathbf{b} = (b_1, b_2, b_3)^t$ and
$\mathbf{c} = (c_1, c_2, c_3)^t$. We define 3-by-3 encoding matrices $\mathbf{A}_i$, $\mathbf{B}_i$ and $\mathbf{C}_i$ (for i = 1, 2, 3), and
3-dimensional projection vectors $\mathbf{v}_{\alpha i}$'s.

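Consider the 5 (= d) equations downloaded from the nodes:

$$\begin{bmatrix} \mathbf{0}^t \\ \mathbf{0}^t \\ (\mathbf{A}_1\mathbf{v}_{\alpha 3})^t \\ (\mathbf{A}_2\mathbf{v}_{\alpha 4})^t \\ (\mathbf{A}_3\mathbf{v}_{\alpha 5})^t \end{bmatrix} \mathbf{a} + \begin{bmatrix} \mathbf{v}_{\alpha 1}^t \\ \mathbf{0}^t \\ (\mathbf{B}_1\mathbf{v}_{\alpha 3})^t \\ (\mathbf{B}_2\mathbf{v}_{\alpha 4})^t \\ (\mathbf{B}_3\mathbf{v}_{\alpha 5})^t \end{bmatrix} \mathbf{b} + \begin{bmatrix} \mathbf{0}^t \\ \mathbf{v}_{\alpha 2}^t \\ (\mathbf{C}_1\mathbf{v}_{\alpha 3})^t \\ (\mathbf{C}_2\mathbf{v}_{\alpha 4})^t \\ (\mathbf{C}_3\mathbf{v}_{\alpha 5})^t \end{bmatrix} \mathbf{c}.$$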

In order to successfully recover the desired signal components of "a", the matrices associated
with b and c should each have rank 1, while the matrix associated with a should have full
rank of 3. In accordance with the (4, 2) code example in Fig. 3, if one were to set $\mathbf{v}_{\alpha 3} = \mathbf{B}_1^{-1}\mathbf{v}_{\alpha 1}$,
$\mathbf{v}_{\alpha 4} = \mathbf{B}_2^{-1}\mathbf{v}_{\alpha 1}$ and $\mathbf{v}_{\alpha 5} = \mathbf{B}_3^{-1}\mathbf{v}_{\alpha 1}$, then it is possible to achieve interference alignment with
respect to b. However, this choice also specifies the interference space of c. If the $\mathbf{B}_i$'s and
$\mathbf{C}_i$'s are not designed judiciously, interference alignment is not guaranteed for c. Hence, it is
not evident how to achieve interference alignment for b and c at the same time.

In order to address the challenge of simultaneous interference alignment, we invoke a common
eigenvector concept. The idea consists of two parts: (i) designing the $(\mathbf{A}_i, \mathbf{B}_i, \mathbf{C}_i)$'s such that $\mathbf{v}_1$
is a common eigenvector of the $\mathbf{B}_i$'s and $\mathbf{C}_i$'s, but not of the $\mathbf{A}_i$'s⁷; (ii) repairing by having survivor
nodes project their data onto a linear subspace spanned by this common eigenvector $\mathbf{v}_1$. We
can then achieve interference alignment for b and c at the same time, by setting $\mathbf{v}_{\alpha i} = \mathbf{v}_1, \forall i$.
As long as $[\mathbf{A}_1\mathbf{v}_1, \mathbf{A}_2\mathbf{v}_1, \mathbf{A}_3\mathbf{v}_1]$ is invertible, we can also guarantee the decodability of a. See
Fig. 5.

⁷ Of course, five additional constraints also need to be satisfied for the other five failure configurations for this (6, 3, 5) code
example.

Fig. 5. Illustration of exact repair of systematic node 1 for (6, 3, 5) E-MSR codes. The idea consists of two parts: (i) designing
$(\mathbf{A}_i, \mathbf{B}_i, \mathbf{C}_i)$'s such that $\mathbf{v}_1$ is a common eigenvector of the $\mathbf{B}_i$'s and $\mathbf{C}_i$'s, but not of the $\mathbf{A}_i$'s; (ii) repairing by having survivor
nodes project their data onto a linear subspace spanned by this common eigenvector $\mathbf{v}_1$. The goal is rank 3 for the matrix
associated with a and rank 1 for each of the matrices associated with b and c.



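The challenge is now to design encoding matrices to guarantee the existence of a common
eigenvector while also satisfying the decodability of desired signals. The difficulty comes from
the fact that in our (6, 3, 5) code example, these constraints need to be satisfied for all six
possible failure configurations. The structure of elementary matrices [14], [15] (generalized
matrices of Householder and Gauss matrices) gives insights into this. To see this, consider a
3-by-3 elementary matrix $\mathbf{A}$:

$$\mathbf{A} = \mathbf{u}\mathbf{v}^t + \alpha\mathbf{I}, \qquad (5)$$

where $\mathbf{u}$ and $\mathbf{v}$ are 3-dimensional vectors. Here is an observation that motivates our proposed
structure: the dimension of the null space of $\mathbf{v}$ is 2 and any null vector $\mathbf{v}^{\perp}$ is an eigenvector of
$\mathbf{A}$, i.e., $\mathbf{A}\mathbf{v}^{\perp} = \alpha\mathbf{v}^{\perp}$. This motivates the following structure:

$$\begin{aligned}
\mathbf{A}_1 &= \mathbf{u}_1\mathbf{v}_1^t + \alpha_1\mathbf{I}; & \mathbf{B}_1 &= \mathbf{u}_1\mathbf{v}_2^t + \beta_1\mathbf{I}; & \mathbf{C}_1 &= \mathbf{u}_1\mathbf{v}_3^t + \gamma_1\mathbf{I} \\
\mathbf{A}_2 &= \mathbf{u}_2\mathbf{v}_1^t + \alpha_2\mathbf{I}; & \mathbf{B}_2 &= \mathbf{u}_2\mathbf{v}_2^t + \beta_2\mathbf{I}; & \mathbf{C}_2 &= \mathbf{u}_2\mathbf{v}_3^t + \gamma_2\mathbf{I} \\
\mathbf{A}_3 &= \mathbf{u}_3\mathbf{v}_1^t + \alpha_3\mathbf{I}; & \mathbf{B}_3 &= \mathbf{u}_3\mathbf{v}_2^t + \beta_3\mathbf{I}; & \mathbf{C}_3 &= \mathbf{u}_3\mathbf{v}_3^t + \gamma_3\mathbf{I},
\end{aligned} \qquad (6)$$

where the $\mathbf{v}_i$'s are 3-dimensional linearly independent vectors and so are the $\mathbf{u}_i$'s. The values of the
$\alpha_i$'s, $\beta_i$'s and $\gamma_i$'s can be arbitrary non-zero values. First consider the simple case where the $\mathbf{v}_i$'s
are orthonormal. This is for conceptual simplicity. Later we will generalize to the case where
the $\mathbf{v}_i$'s need not be orthogonal but only linearly independent, namely the bi-orthogonal case. For
the orthogonal case, we see that for i = 1, 2, 3,

$$\begin{aligned}
\mathbf{A}_i\mathbf{v}_1 &= \alpha_i\mathbf{v}_1 + \mathbf{u}_i, \\
\mathbf{B}_i\mathbf{v}_1 &= \beta_i\mathbf{v}_1, \\
\mathbf{C}_i\mathbf{v}_1 &= \gamma_i\mathbf{v}_1.
\end{aligned} \qquad (7)$$

Importantly, notice that $\mathbf{v}_1$ is a common eigenvector of the $\mathbf{B}_i$'s and $\mathbf{C}_i$'s, while simultaneously
ensuring that the vectors $\mathbf{A}_i\mathbf{v}_1$ are linearly independent. Hence, setting $\mathbf{v}_{\alpha i} = \mathbf{v}_1$ for all i, it is
possible to achieve simultaneous interference alignment while also guaranteeing the decodability
of the desired signals. See Fig. 5. On the other hand, this structure also guarantees exact repair
for b and c. We use $\mathbf{v}_2$ for exact repair of b. It is a common eigenvector of the $\mathbf{C}_i$'s and $\mathbf{A}_i$'s,
while ensuring that $[\mathbf{B}_1\mathbf{v}_2, \mathbf{B}_2\mathbf{v}_2, \mathbf{B}_3\mathbf{v}_2]$ is invertible. Similarly, $\mathbf{v}_3$ is used for c.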

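We will see that a dual basis property gives insights into the general bi-orthogonal case where
$\{\mathbf{v}\} := (\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$ is not orthogonal but linearly independent. In this case, defining a dual basis
$\{\mathbf{v}'\} := (\mathbf{v}'_1, \mathbf{v}'_2, \mathbf{v}'_3)$ gives the solution:

$$\begin{bmatrix} \mathbf{v}_1'^t \\ \mathbf{v}_2'^t \\ \mathbf{v}_3'^t \end{bmatrix} := \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{bmatrix}^{-1}.$$

The definition gives the following property: $\mathbf{v}_i'^t\mathbf{v}_j = \delta(i-j)$. Using this property, one can
see that $\mathbf{v}'_1$ is a common eigenvector of the $\mathbf{B}_i$'s and $\mathbf{C}_i$'s:

$$\begin{aligned}
\mathbf{A}_i\mathbf{v}'_1 &= \alpha_i\mathbf{v}'_1 + \mathbf{u}_i, \\
\mathbf{B}_i\mathbf{v}'_1 &= \beta_i\mathbf{v}'_1, \\
\mathbf{C}_i\mathbf{v}'_1 &= \gamma_i\mathbf{v}'_1.
\end{aligned} \qquad (8)$$

So it can be used as a projection vector for exact repair of a. Similarly, we can use $\mathbf{v}'_2$ and $\mathbf{v}'_3$
for exact repair of b and c, respectively.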
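
The common-eigenvector property (8) is easy to confirm numerically. The following is a minimal sketch over the reals (not over a finite field, which is a simplification made here), assuming randomly drawn, well-conditioned bases and coefficients; the variable and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
I3 = np.eye(3)

# Columns of V are v_1, v_2, v_3; columns of U are u_1, u_2, u_3.
# Small perturbations of the identity are strictly diagonally dominant, hence invertible.
V = I3 + rng.uniform(-0.2, 0.2, (3, 3))
U = I3 + rng.uniform(-0.2, 0.2, (3, 3))
coef = rng.uniform(1.0, 2.0, (3, 3))   # rows: alpha_i, beta_i, gamma_i (all non-zero)

V_dual = np.linalg.inv(V).T            # columns v'_i satisfy v'_i^t v_j = delta(i - j)


def enc(row, i):
    """Encoding matrix of structure (6): row 0, 1, 2 gives A_i, B_i, C_i."""
    return np.outer(U[:, i], V[:, row]) + coef[row, i] * I3


v1d = V_dual[:, 0]                     # projection vector v'_1 for repairing a
eigen_ok = all(
    np.allclose(enc(1, i) @ v1d, coef[1, i] * v1d)          # B_i v'_1 = beta_i v'_1
    and np.allclose(enc(2, i) @ v1d, coef[2, i] * v1d)      # C_i v'_1 = gamma_i v'_1
    and np.allclose(enc(0, i) @ v1d, coef[0, i] * v1d + U[:, i])  # A_i v'_1
    for i in range(3)
)

# Decodability of a: the desired-signal directions must span all three dimensions.
desired = np.column_stack([enc(0, i) @ v1d for i in range(3)])
decodable = np.linalg.matrix_rank(desired) == 3
print(eigen_ok, decodable)
```

Taking the dual basis as the columns of the transposed inverse of V is exactly the definition stated above, so the orthonormal case is recovered when V is orthogonal.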



B. Dual Relationship between Systematic and Parity Node Repair 

We have seen so far how to ensure exact repair of the systematic nodes. We have seen
that if {v} is linearly independent and so is {u}, then using the code structure of (6) together
with the projection directions $\mathbf{v}'_i$ enables repair, for arbitrary values of the $(\alpha_i, \beta_i, \gamma_i)$'s. A natural question
is now: will this code structure also guarantee exact repair of parity nodes? It turns out that
for exact repair of all nodes, we need a special relationship between {v} and {u} through the
correct choice of the $(\alpha_i, \beta_i, \gamma_i)$'s.
We will show that parity nodes can be repaired by drawing a dual relationship with systematic
nodes. The procedure has two steps. The first is to remap parity nodes with a', b', and c',
respectively:

$$\begin{bmatrix} \mathbf{a}' \\ \mathbf{b}' \\ \mathbf{c}' \end{bmatrix} := \begin{bmatrix} \mathbf{A}_1^t & \mathbf{B}_1^t & \mathbf{C}_1^t \\ \mathbf{A}_2^t & \mathbf{B}_2^t & \mathbf{C}_2^t \\ \mathbf{A}_3^t & \mathbf{B}_3^t & \mathbf{C}_3^t \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \\ \mathbf{c} \end{bmatrix}. \qquad (9)$$

Systematic nodes can then be rewritten in terms of the prime notations:

$$\begin{aligned}
\mathbf{a}^t &= \mathbf{a}'^t\mathbf{A}'_1 + \mathbf{b}'^t\mathbf{B}'_1 + \mathbf{c}'^t\mathbf{C}'_1, \\
\mathbf{b}^t &= \mathbf{a}'^t\mathbf{A}'_2 + \mathbf{b}'^t\mathbf{B}'_2 + \mathbf{c}'^t\mathbf{C}'_2, \\
\mathbf{c}^t &= \mathbf{a}'^t\mathbf{A}'_3 + \mathbf{b}'^t\mathbf{B}'_3 + \mathbf{c}'^t\mathbf{C}'_3,
\end{aligned}$$

where the newly mapped encoding matrices $(\mathbf{A}'_i, \mathbf{B}'_i, \mathbf{C}'_i)$'s are defined as:

$$\begin{bmatrix} \mathbf{A}'_1 & \mathbf{A}'_2 & \mathbf{A}'_3 \\ \mathbf{B}'_1 & \mathbf{B}'_2 & \mathbf{B}'_3 \\ \mathbf{C}'_1 & \mathbf{C}'_2 & \mathbf{C}'_3 \end{bmatrix} := \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \mathbf{A}_3 \\ \mathbf{B}_1 & \mathbf{B}_2 & \mathbf{B}_3 \\ \mathbf{C}_1 & \mathbf{C}_2 & \mathbf{C}_3 \end{bmatrix}^{-1}. \qquad (10)$$

With this remapping, one can dualize the relationship between systematic and parity node repair.
Specifically, if all of the $\mathbf{A}'_i$'s, $\mathbf{B}'_i$'s, and $\mathbf{C}'_i$'s are elementary matrices and form a similar code
structure as in (6), exact repair of the parity nodes becomes transparent.

The challenge is now how to guarantee the dual structure. In Lemma 1 we show that a special
relationship between {u} and {v} through the $(\alpha_i, \beta_i, \gamma_i)$'s can guarantee this dual relationship.

Fig. 6. Exact repair of a parity node for (6, 3) E-MSR code. The idea is to construct the dual code structure of (13) by
remapping parity nodes and then adding the sufficient conditions of (11) and (12).



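Lemma 1: Suppose

$$\mathbf{M} := \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \\ \beta_1 & \beta_2 & \beta_3 \\ \gamma_1 & \gamma_2 & \gamma_3 \end{bmatrix} \text{ is invertible.} \qquad (11)$$

Also assume

$$\kappa\mathbf{U} = \mathbf{V}'\mathbf{M}, \qquad (12)$$

where $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3]$, $\mathbf{V}' = [\mathbf{v}'_1, \mathbf{v}'_2, \mathbf{v}'_3]$, $\{\mathbf{v}'\} := \{\mathbf{v}'_1, \mathbf{v}'_2, \mathbf{v}'_3\}$ is the dual basis of {v}, i.e.,
$\mathbf{v}_i'^t\mathbf{v}_j = \delta(i-j)$, and $\kappa$ is an arbitrary non-zero value s.t. $1 - \kappa^2 \ne 0$. Then, we can obtain the
dual structure of (6) as follows:

$$\begin{aligned}
\mathbf{A}'_1 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_1\mathbf{u}_1'^t - \kappa^2\alpha'_1\mathbf{I}\right); & \mathbf{B}'_1 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_1\mathbf{u}_2'^t - \kappa^2\alpha'_2\mathbf{I}\right); & \mathbf{C}'_1 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_1\mathbf{u}_3'^t - \kappa^2\alpha'_3\mathbf{I}\right) \\
\mathbf{A}'_2 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_2\mathbf{u}_1'^t - \kappa^2\beta'_1\mathbf{I}\right); & \mathbf{B}'_2 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_2\mathbf{u}_2'^t - \kappa^2\beta'_2\mathbf{I}\right); & \mathbf{C}'_2 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_2\mathbf{u}_3'^t - \kappa^2\beta'_3\mathbf{I}\right) \\
\mathbf{A}'_3 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_3\mathbf{u}_1'^t - \kappa^2\gamma'_1\mathbf{I}\right); & \mathbf{B}'_3 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_3\mathbf{u}_2'^t - \kappa^2\gamma'_2\mathbf{I}\right); & \mathbf{C}'_3 &= \tfrac{1}{1-\kappa^2}\left(\mathbf{v}'_3\mathbf{u}_3'^t - \kappa^2\gamma'_3\mathbf{I}\right),
\end{aligned} \qquad (13)$$

where $\{\mathbf{u}'\}$ is the dual basis of {u}, i.e., $\mathbf{u}_i'^t\mathbf{u}_j = \delta(i-j)$, and the $(\alpha'_i, \beta'_i, \gamma'_i)$'s are the dual basis
vectors, i.e., $\langle (\alpha'_i, \beta'_i, \gamma'_i), (\alpha_j, \beta_j, \gamma_j) \rangle = \delta(i-j)$:

$$\begin{bmatrix} \alpha'_1 & \beta'_1 & \gamma'_1 \\ \alpha'_2 & \beta'_2 & \gamma'_2 \\ \alpha'_3 & \beta'_3 & \gamma'_3 \end{bmatrix} := \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \\ \beta_1 & \beta_2 & \beta_3 \\ \gamma_1 & \gamma_2 & \gamma_3 \end{bmatrix}^{-1}.$$

Proof: See Appendix A. ∎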
Remark 1: The dual structure of (13) now gives exact-repair solutions for parity nodes. For
exact repair of parity node 1, we can use vector $u_1$ (a common eigenvector of the $B'_i$'s and
$C'_i$'s), since it enables simultaneous interference alignment for $b'$ and $c'$, while ensuring the
decodability of $a'$. See Fig. 6. Notice that the extra conditions (11) and (12) are added to ensure
exact repair of all nodes, while these conditions were unnecessary for exact repair of systematic
nodes only. Also note that these are only sufficient conditions.

Remark 2: Note that the dual structure of (13) is quite similar to the primary structure of (6).
The only difference is that in the dual structure, $\{u\}$ and $\{v\}$ are interchanged to form a
transpose-like structure. This reveals insights into how to guarantee exact repair of parity nodes
in a transparent manner.
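The statement of Lemma 1 can be checked numerically. The following is a minimal sketch, not the paper's own example: it assumes the prime field GF(7), an arbitrarily chosen invertible V, κ = 2 and a Cauchy matrix M, all of which are illustrative choices. It builds the primary blocks $u_c v_j^* + M_{jc} I$ and the dual blocks of (13), confirms that the two composite matrices are inverses of each other, and confirms that $u_1$ is a common eigenvector of the $B'_i$'s as used in Remark 1.

```python
# Illustrative check of Lemma 1 over GF(7); P, kappa, V and M are assumptions.
P = 7

def inv(a):
    return pow(a % P, P - 2, P)  # Fermat inverse in a prime field

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def transpose(X):
    return [list(r) for r in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % P
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) % P for row in X]

def mat_inv(X):
    """Gauss-Jordan inverse mod P (assumes X is invertible)."""
    n = len(X)
    A = [[x % P for x in row] + e for row, e in zip(X, identity(n))]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c])
        A[c], A[piv] = A[piv], A[c]
        s = inv(A[c][c])
        A[c] = [x * s % P for x in A[c]]
        for r in range(n):
            if r != c and A[r][c]:
                f = A[r][c]
                A[r] = [(x - f * y) % P for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def col(X, j):
    return [row[j] for row in X]

def rank_one_plus(u, v, s, scale=1):
    """Return scale * (u v^T + s I) mod P."""
    n = len(u)
    return [[scale * (u[i] * v[j] + (s if i == j else 0)) % P
             for j in range(n)] for i in range(n)]

def assemble(blocks):
    """Flatten a 3x3 grid of 3x3 blocks into one 9x9 matrix."""
    return [sum((blocks[i][j][r] for j in range(3)), [])
            for i in range(3) for r in range(3)]

kappa = 2                                                   # 1 - kappa^2 != 0 mod 7
M = [[inv(x - y) for y in (3, 4, 5)] for x in (0, 1, 2)]    # rows: alpha, beta, gamma
V = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]                       # any invertible V
Vd = mat_inv(transpose(V))                                  # dual basis {v'}
U = [[inv(kappa) * x % P for x in row] for row in matmul(Vd, M)]  # condition (12)
Ud = mat_inv(transpose(U))                                  # dual basis {u'}
Md = mat_inv(transpose(M))                                  # (alpha', beta', gamma')

# Primary structure: block (j, c) = u_c v_j^T + M[j][c] I.
G = assemble([[rank_one_plus(col(U, c), col(V, j), M[j][c])
               for c in range(3)] for j in range(3)])
# Dual structure (13): block (r, j) = (v'_j u'_r^T - kappa^2 M'[j][r] I) / (1 - kappa^2).
scale = inv(1 - kappa * kappa)
Gd = assemble([[rank_one_plus(col(Vd, j), col(Ud, r), -kappa * kappa * Md[j][r], scale)
                for j in range(3)] for r in range(3)])
dual_is_inverse = matmul(Gd, G) == identity(9)

# u_1 is an eigenvector of every B'_j (r = 1 selects the B' row of blocks).
u1 = col(U, 0)
aligned = all(
    matvec(rank_one_plus(col(Vd, j), col(Ud, 1), -kappa * kappa * Md[j][1], scale), u1)
    == [(-scale * kappa * kappa * Md[j][1]) * x % P for x in u1]
    for j in range(3))
```

Both flags evaluate the lemma for one parameter choice only; they do not replace the proof in Appendix A.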

C. The MDS-Code Property 

The third part of the framework is to guarantee the MDS-code property, which allows us to
identify specific constraints on the $(\alpha_i, \beta_i, \gamma_i)$'s and/or $(\{v\}, \{u\})$. Consider four cases, according
to the nodes to which the Data Collector (DC), who is interested in the source file, connects: (1) 3 systematic
nodes; (2) 3 parity nodes; (3) 2 systematic nodes and 1 parity node; (4) 1 systematic node and 2 parity nodes.

The first is a trivial case. The second case has already been verified in the process of forming
the dual code structure of (13): the invertibility condition of (11) together with (12) suffices to
ensure the invertibility of the composite matrix. The third case requires the invertibility
of each encoding matrix. In this case, it is necessary that the $\alpha_i$'s, $\beta_i$'s and $\gamma_i$'s are non-zero
values; otherwise, each encoding matrix has rank 1. Conversely, the non-zero values together with (12)
guarantee the invertibility of each encoding matrix. Under these conditions, for example, the
inverse of $A_1$ is well defined as:

\[
A_1^{-1} = \frac{1}{\alpha_1}\left( I - \frac{1}{\alpha_1 + v_1^* u_1}\, u_1 v_1^* \right)
= \frac{1}{\alpha_1}\left( I - \frac{\kappa}{\alpha_1(\kappa + 1)}\, u_1 v_1^* \right),
\]

where the second equality follows from $v_1^* u_1 = \alpha_1 / \kappa$ due to (12).

The last case requires some non-trivial work. Consider a specific example where the DC
connects to nodes (3, 4, 5). In this case, we first recover $c$ from node 3 and subtract the terms
associated with $c$ from nodes 4 and 5. We then get:

\[
\begin{bmatrix} A_1 & A_2 \\ B_1 & B_2 \end{bmatrix}
= \begin{bmatrix} u_1 v_1^* + \alpha_1 I & u_2 v_1^* + \alpha_2 I \\ u_1 v_2^* + \beta_1 I & u_2 v_2^* + \beta_2 I \end{bmatrix}. \tag{15}
\]

Using a Gaussian elimination method, we show that the sub-composite matrix is invertible if

\[
M_2 := \begin{bmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \end{bmatrix} \text{ is invertible.} \tag{16}
\]

Here is the Gaussian elimination method:

[Displayed chain of matrix equivalences with steps (a) and (b); its entries are garbled in this copy.]

where (a) follows from multiplying $[u'^*_1, 0^*;\ 0^*, u'^*_1;\ u'^*_2, 0^*;\ 0^*, u'^*_2;\ u'^*_3, 0^*;\ 0^*, u'^*_3]$ to the left; (b)
follows from multiplying $[M_2^{-1}, 0, 0;\ 0, M_2^{-1}, 0;\ 0, 0, M_2^{-1}]$ to the left. Here the $(\alpha'_i, \beta'_i)$'s are the
dual basis vectors of the $(\alpha_i, \beta_i)$'s. Note that the resulting matrix is invertible, since $\{u'\}$ is a dual
basis.

Considering the above 4 cases, the following condition together with (11) and (12) suffices
for guaranteeing the MDS-code property:

Any submatrix of $M$ of (11) is invertible. (17)

D. Code Construction with Finite-Field Alphabets

The last part is to design $M$ of (11) and $\{v\} := (v_1, v_2, v_3)$ in $\mathbb{F}_q$ such that $\{v\}$ is linearly
independent and the conditions of (12) and (17) are satisfied. First, in order to guarantee (17),
we can use a Cauchy matrix, as was done for the code introduced in [4].

Definition 1 (A Cauchy Matrix [16]): A Cauchy matrix $M$ is an $m \times n$ matrix with entries
$m_{ij}$ in the form:

\[
m_{ij} = \frac{1}{x_i - y_j}, \quad \forall i = 1, \cdots, m,\ j = 1, \cdots, n,\ x_i \neq y_j,
\]

where $x_i$ and $y_j$ are elements of a field and $\{x_i\}$ and $\{y_j\}$ are injective sequences, i.e., elements
of the sequence are distinct.

The injective property of $\{x_i\}$ and $\{y_j\}$ requires a finite field size of $2s$ for an $s \times s$ Cauchy
matrix. Therefore, in our (6, 3, 5) code example, the finite field size of 6 suffices. The field size
condition for guaranteeing linear independence of $\{v\}$ is more relaxed.
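Condition (17) and the Cauchy construction can be illustrated with a short sketch. It assumes the prime field GF(7) (the smallest prime field with at least 2s = 6 elements for s = 3) and the hypothetical sequences x = (0, 1, 2), y = (3, 4, 5); the helper names are illustrative, not taken from the paper. It enumerates every square submatrix and checks that its determinant is non-zero.

```python
from itertools import combinations

P = 7  # assumed prime field; needs at least 2s distinct elements for an s x s matrix

def det(X):
    """Determinant mod P by cofactor expansion (adequate for tiny matrices)."""
    if len(X) == 1:
        return X[0][0] % P
    total = 0
    for j, x in enumerate(X[0]):
        minor = [row[:j] + row[j + 1:] for row in X[1:]]
        total += (-1) ** j * x * det(minor)
    return total % P

def cauchy(xs, ys):
    """Cauchy matrix with entries 1 / (x_i - y_j) mod P."""
    if len(set(xs) | set(ys)) != len(xs) + len(ys):
        raise ValueError("all x_i and y_j must be distinct")
    return [[pow((x - y) % P, P - 2, P) for y in ys] for x in xs]

def all_submatrices_invertible(M):
    """Check condition (17): every square submatrix has non-zero determinant."""
    n = len(M)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                if det([[M[r][c] for c in cols] for r in rows]) == 0:
                    return False
    return True

M = cauchy([0, 1, 2], [3, 4, 5])
```

Calling `all_submatrices_invertible(M)` on this matrix returns `True`, while a matrix with a repeated row, such as `[[1, 1], [1, 1]]`, fails the check.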

E. Summary 

Using the code structure of (6) and the conditions of (11), (12) and (17), we can now state
the following theorem.

Theorem 1 ((6, 3, 5) E-MSR Codes): Suppose $M$ of (11) is a Cauchy matrix, i.e., every submatrix
of $M$ is invertible. Each element of $M$ is in GF($q$) and $q \geq 6$. Suppose encoding matrices
form the code structure of (6), $\{v\} := (v_1, v_2, v_3)$ is linearly independent, and $\{u\}$ satisfies the
condition of (12). Then, the code satisfies the MDS property and achieves the MSR point under
exact repair constraints of all nodes.

Remark 3: Note that the code introduced in [4] is a special case of Theorem 1 where
$V = [v_1, v_2, v_3] = I$.

V. Examples

We provide two numerical examples: (1) an orthogonal code example where $V = [v_1, v_2, v_3]$
is orthogonal, e.g., $V = I$; (2) a bi-orthogonal code example where $V$ is not orthogonal but
invertible. As mentioned earlier, the code in [4] belongs to the case of $V = I$.

We will also discuss the complexity of the repair schemes for each of these examples.
It turns out that the first code has lower complexity for exact repair of systematic
nodes than for that of parity nodes. On the other hand, for the second, bi-orthogonal code,
the specific choice of $V = \kappa^{-1} M^*$ gives $U = I$, thereby providing much simpler parity-node
repair schemes instead. Depending on the application of interest, one can choose an appropriate
code among our generalized family of codes.

A. Orthogonal Case 

We present an example of (6, 3, 5) E-MSR codes defined over GF(4) where $V = I$ and
$U = \kappa^{-1} V' M = 2M$, where $U$ is set based on (12) and $\kappa = 2^{-1}$. We use a generator polynomial of $g(x) = x^2 + x + 1$.
Notice that we employ a non-Cauchy-type matrix to construct a field-size 4 code (smaller than
the size 6 required when using a Cauchy matrix). Remember that a Cauchy matrix provides only a
sufficient condition for ensuring the invertibility of any submatrices of $M$. By (6) and (13), the
primary and dual code structures are given by the composite matrices $G$ and $G'$ of (18), where

\[
G := \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{bmatrix}, \qquad
G' := \begin{bmatrix} A'_1 & A'_2 & A'_3 \\ B'_1 & B'_2 & B'_3 \\ C'_1 & C'_2 & C'_3 \end{bmatrix}.
\]

[Fig. 7 graphic. Panel (a), exact repair of systematic node 1: each survivor node sends its first stored symbol. The three parity nodes store
$(3a_1 + 2a_2 + 2a_3 + b_1 + c_1,\ a_2 + 2b_1 + 3b_2 + 2b_3 + c_2,\ a_3 + b_3 + 2c_1 + 2c_2 + 3c_3)$,
$(3a_1 + 3a_2 + a_3 + 2b_1 + 3c_1,\ a_2 + 2b_1 + b_2 + b_3 + 3c_2,\ a_3 + 2b_3 + 2c_1 + 3c_2 + 2c_3)$ and
$(3a_1 + a_2 + 3a_3 + 3b_1 + 2c_1,\ a_2 + 2b_1 + 2b_2 + 3b_3 + 2c_2,\ a_3 + 3b_3 + 2c_1 + c_2 + c_3)$.
Panel (b), exact repair of parity node 1: the same code in its dual form, in terms of $(a', b', c')$.]

(a) Exact repair of systematic node 1 (b) Exact repair of parity node 1

Fig. 7. Orthogonal case: Illustration of exact repair for a (6, 3, 5) E-MSR code defined over GF(4) with generator polynomial
$g(x) = x^2 + x + 1$. The projection vector solution for systematic node repair is quite simple: $v_{\alpha i} = v_1 = (1, 0, 0)^*$, $\forall i$. We
download only the first equation from each survivor node. For parity node repair, our new framework provides a simple scheme:
setting all of the projection vectors as $2^{-1} u_1 = (1, 1, 1)^*$. This enables simultaneous interference alignment, while guaranteeing
the decodability of $a'$.

Fig. 7 shows an example for exact repair of (a) systematic node 1 and (b) parity node 1.
Note that the projection vector solution for systematic node repair is quite simple: $v_{\alpha i} = v_1 =
(1, 0, 0)^*$, $\forall i$. We download only the first equation from each survivor node. Notice that the
five downloaded equations contain only five unknown variables $(a_1, a_2, a_3, b_1, c_1)$ and the three
equations associated with $a$ are linearly independent. Hence, we can successfully recover $a$.

On the other hand, exact repair of parity nodes seems less straightforward. However, our
framework provides quite a simple repair scheme: setting all of the projection vectors as $2^{-1} u_1 =
(1, 1, 1)^*$. This enables simultaneous interference alignment, while guaranteeing the decodability
of $a'$. Notice that $(b'_1, b'_2, b'_3)$ and $(c'_1, c'_2, c'_3)$ are aligned into $b'_1 + b'_2 + b'_3$ and $c'_1 + c'_2 + c'_3$, respectively,
while the three equations associated with $a'$ are linearly independent.

As one can see, the complexity of systematic node repair is a little lower than that of
parity node repair, although both repair schemes are simple. Hence, one can expect that this
orthogonal code is useful for applications where the complexity of systematic node repair
needs to be very low.
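The orthogonal example can be reproduced with a short sketch over GF(4). One assumption is made loudly here: the matrix M = [1 1 1; 1 2 3; 1 3 2] is not printed legibly in this copy and is inferred from the bi-orthogonal choice $V = \kappa^{-1} M^*$ and from the parity-node contents visible in Fig. 7. The helper names are illustrative. The sketch builds the six nodes with $V = I$ and $U = 2M$, checks that any three nodes have full rank (the MDS property), and repairs systematic node 1 from the first symbol of each survivor.

```python
from itertools import combinations, product

# GF(4) = {0, 1, 2, 3} with 2 = x, 3 = x + 1 and x^2 = x + 1, i.e. g(x) = x^2 + x + 1.
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]
INV = {1: 1, 2: 3, 3: 2}

def dot(u, v):
    acc = 0
    for a, b in zip(u, v):
        acc ^= MUL[a][b]              # addition in GF(4) is bitwise XOR
    return acc

def rank(rows):
    """Rank over GF(4) by Gauss-Jordan elimination."""
    rows, rk = [list(r) for r in rows], 0
    for c in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][c]), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        s = INV[rows[rk][c]]
        rows[rk] = [MUL[s][x] for x in rows[rk]]
        for r in range(len(rows)):
            if r != rk and rows[r][c]:
                f = rows[r][c]
                rows[r] = [x ^ MUL[f][y] for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

M = [[1, 1, 1], [1, 2, 3], [1, 3, 2]]          # ASSUMED (inferred, see lead-in)
U = [[MUL[2][m] for m in row] for row in M]    # V = I and kappa^{-1} = 2, so U = 2M

def parity_symbol(c, t):
    """Coefficients of (a1..a3, b1..b3, c1..c3) in symbol t of parity node c."""
    return [(U[i][c] if t == j else 0) ^ (M[j][c] if i == t else 0)
            for j in range(3) for i in range(3)]

def systematic_symbol(j, t):
    return [int(idx == 3 * j + t) for idx in range(9)]

nodes = ([[systematic_symbol(j, t) for t in range(3)] for j in range(3)]
         + [[parity_symbol(c, t) for t in range(3)] for c in range(3)])

# MDS property: any 3 of the 6 nodes determine all 9 source symbols.
mds = all(rank([sym for n in subset for sym in nodes[n]]) == 9
          for subset in combinations(range(6), 3))

# Exact repair of systematic node 1: download the first symbol of each survivor.
data = [1, 2, 3, 0, 1, 2, 3, 3, 1]             # arbitrary test file (a, b, c)
downloads = [(nodes[n][0], dot(nodes[n][0], data)) for n in range(1, 6)]
b1, c1 = downloads[0][1], downloads[1][1]      # sent directly by nodes "b" and "c"
recovered = [a for a in product(range(4), repeat=3)
             if all(dot(coef, list(a) + [b1, 0, 0, c1, 0, 0]) == val
                    for coef, val in downloads[2:])]
```

The brute-force search over the 64 candidates for $(a_1, a_2, a_3)$ stands in for solving the 3-by-3 linear system; it returns a single candidate because the three parity equations are linearly independent in $a$.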



B. Bi-Orthogonal Case 

We provide another example of (6, 3, 5) E-MSR codes where $V$ is not orthogonal but invertible.
We use the same field size of 4, the same generator polynomial and the same $M$. Instead, we
choose a non-orthogonal $V$ so that the complexity of parity node repair can be made significantly lower.
Our framework provides a concrete guideline for designing this type of code. Remember that the
projection vector solutions are $u_1$, $u_2$ and $u_3$ for exact repair of each parity node, respectively.
For low complexity, we can first set $U = I$. The condition (12) then gives the following choice:

\[
V = \kappa^{-1} M^* = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 3 & 1 \\ 2 & 1 & 3 \end{bmatrix},
\]

where we use $\kappa = 2^{-1}$. By (6) and (13), the primary and dual code structures are given by (19).

[Fig. 8 graphic: the stored symbols of each node in primary form, in terms of $(a, b, c)$, and in dual form, in terms of $(a', b', c')$.]

(a) Exact repair of systematic node 1 (b) Exact repair of parity node 1

Fig. 8. Bi-orthogonal case: Illustration of exact repair for a (6, 3, 5) E-MSR code defined over GF(4) with generator
polynomial $g(x) = x^2 + x + 1$. We use $U = I$. For parity node repair, the solution for projection vectors is much simpler: we
download only the first equation from each survivor node. Systematic node repair is a bit more involved: all of the projection
vectors are set to $2^{-1} v_1 = (1, 1, 1)^*$.



Notice that the matrices of (19) have exactly the transpose structure of the matrices of (18).
Hence, this code of (19) is a dual solution of (18), thereby providing switched projection vector
solutions and lowering the complexity of parity node repair.

Fig. 8 shows an example for exact repair of (a) systematic node 1 and (b) parity node 1.
In contrast to the previous case, exact repair of parity nodes is now much simpler. In this example,
by downloading only the first equation from each survivor node, we can successfully recover
$a'$. Systematic node repair, on the other hand, is a bit more involved: a projection vector solution is
$2^{-1} v_1 = (1, 1, 1)^*$. Using this vector, we can achieve simultaneous interference alignment, thereby
decoding the desired components of $a$.

VI. Generalization: $n \geq 2k$; $d \geq 2k - 1$

Theorem 1 gives insights into generalization to $(2k, k, 2k - 1)$ E-MSR codes. The key observation
is that assuming $\mathcal{M} = k(d - k + 1)$, storage cost is $\alpha = \mathcal{M}/k = d - k + 1 = k$ and this number
is equal to the number of systematic nodes and furthermore matches the number of parity nodes.
Notice that the storage size matches the size of encoding matrices, which determines the number
of linearly independent vectors of $\{v\} := \{v_1, \cdots\}$. In this case, therefore, we can generate $k$
linearly independent vectors $\{v\} := \{v_1, \cdots, v_k\}$ and corresponding $\{u\} := \{u_1, \cdots, u_k\}$
through the appropriate choice of $M$ to design $(2k, k, 2k - 1)$ E-MSR codes.

A. Case: n = 2k 

Theorem 2 ($(2k, k, 2k - 1)$ E-MSR Codes): Let $M$ be a Cauchy matrix:

\[
M = \begin{bmatrix}
m_1^{(1)} & m_2^{(1)} & \cdots & m_k^{(1)} \\
m_1^{(2)} & m_2^{(2)} & \cdots & m_k^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
m_1^{(k)} & m_2^{(k)} & \cdots & m_k^{(k)}
\end{bmatrix},
\]

where each element $m_i^{(l)} \in \mathrm{GF}(q)$, where $q \geq 2k$. Suppose $V = [v_1, \cdots, v_k]$ is invertible and

\[
U = \kappa^{-1} V' M, \tag{20}
\]

where $V' = (V^*)^{-1}$ and $\kappa$ is an arbitrary non-zero value $\in \mathbb{F}_q$ such that $1 - \kappa^2 \neq 0$. Also assume
that encoding matrices are given by

\[
G_i^{(l)} = u_i v_l^* + m_i^{(l)} I, \quad \forall i, l = 1, \cdots, k, \tag{21}
\]

where $G_i^{(l)}$ indicates an encoding matrix for parity node $i$, associated with information unit
$l$. Then, the code satisfies the MDS property and achieves the MSR point under exact repair
constraints of all nodes.

Proof: See Appendix B. ■
Remark 4: Note that the minimum required alphabet size is $2k$. As mentioned earlier, this
is because we employ a Cauchy matrix for ensuring the invertibility of any submatrices of $M$.
One may customize codes to find smaller alphabet-size codes.

B. Case: $n \geq 2k$; $d \geq 2k - 1$

Now what if $k$ is less than the size ($= \alpha = d - k + 1$) of encoding matrices, i.e., $d > 2k - 1$?
Note that this case automatically implies that $n > 2k$, since $n \geq d + 1$. The key observation in
this case is that the encoding matrix size is bigger than $k$, and therefore we have more degrees
of freedom (a larger number of linearly independent vectors) than the number of constraints.
Hence, exact repair of systematic nodes becomes transparent. This was observed as well in [4],
where it was shown that for this regime, exact repair of systematic nodes only can be guaranteed
by judiciously manipulating $(2k, k, 2k - 1)$ codes through a puncturing operation.

We show that the puncturing technique in [4] (meant for exact repair of systematic nodes and
for a special case of our generalized codes) together with our repair construction schemes can
also carry over to ensure exact repair of all nodes even for the generalized family of codes. The
recipe for this has two parts:

1. Constructing a target code from a larger code through the puncturing technique.

2. Showing that the resulting target code indeed ensures exact repair of all nodes as well as
the MDS-code property for our generalized family of codes.

The first part contains the following detailed steps:
1(a) Using Theorem 2, construct a larger $(2n - 2k, n - k, 2n - 2k - 1)$ code with a finite field
size of $q \geq 2n - 2k$.

1(b) Remove all the elements associated with the $(n - 2k)$ information units (e.g., from the
$(k + 1)$th to the $(n - k)$th information unit). The number of nodes is then reduced by
$(n - 2k)$ and so are the number of information units and the number of degrees. Hence,
we obtain the $(n, k, n - 1)$ code.
1(c) Prune the last $(n - 1 - d)$ equations in each storage node and also the last $(n - 1 - d)$
symbols of each information unit, while keeping the number of information units and
storage nodes. We can then get the $(n, k, d)$ target code.
Indeed, based on our framework in Section IV, it can be shown that the resulting punctured code
described above guarantees exact repair of all nodes and the MDS-code property for our generalized
family of codes. Hence, we obtain the following theorem. The proof procedure is tedious and
mimics that of Theorem 2. Therefore, details are omitted.
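The bookkeeping in steps 1(a) to 1(c) can be sketched as a small helper. The function name and dictionary keys are hypothetical, introduced only for illustration; the arithmetic follows the recipe above and covers only the parameters, not the construction of the code itself.

```python
def puncturing_plan(n, k, d):
    """Parameters of the larger code and of each puncturing step for an (n, k, d) target."""
    if not (n >= 2 * k and 2 * k - 1 <= d <= n - 1):
        raise ValueError("recipe assumes n >= 2k and 2k - 1 <= d <= n - 1")
    big_k = n - k                                   # dimension of the larger code
    return {
        "larger_code": (2 * big_k, big_k, 2 * big_k - 1),    # step 1(a), via Theorem 2
        "field_size_at_least": 2 * big_k,                    # q >= 2n - 2k
        "info_units_removed": n - 2 * k,                     # step 1(b)
        "intermediate_code": (n, k, n - 1),
        "equations_pruned_per_node": n - 1 - d,              # step 1(c)
        "target_code": (n, k, d),
        # n - k symbols per node after 1(b), minus the pruned ones: d - k + 1
        "storage_per_node": (n - k) - (n - 1 - d),
    }

plan = puncturing_plan(5, 2, 3)
```

For the (5, 2, 3) target, the plan starts from a (6, 3, 5) code, removes one information unit and prunes one equation per node, leaving two symbols per node.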

Theorem 3 ($k/n \leq 1/2$, $d \geq 2k - 1$): Under exact repair constraints of all nodes, the optimal tradeoff
of (1) can be attained with a deterministic scheme requiring a field size of at most $2(n - k)$.

[Fig. 9 graphic: the (6, 3, 5) E-MSR code is punctured into a (5, 2, 3) E-MSR code. Remove $(c_1, c_2, c_3)$, $(a_3, b_3)$
and associated elements. Also remove the third equation of each storage node.]

Fig. 9. Bi-orthogonal case: Illustration of the construction of a (5, 2, 3) E-MSR code from a (6, 3, 5) code defined over GF(4).
For a larger code, we adopt the (6, 3, 5) code in Fig. 8. First, we remove all the elements associated with the last $(n - 2k) = 1$
information unit ("c"). Next, we prune symbols $(a_3, b_3)$ and associated elements. Also we remove the last equation of each
storage node. Finally we obtain the $(n, k, d) = (5, 2, 3)$ target code.

Example 1: Fig. 9 illustrates how to construct an $(n, k, d) = (5, 2, 3)$ target code based on the
above recipe. First construct the $(2n - 2k, n - k, 2n - 2k - 1) = (6, 3, 5)$ code, which is larger
than the (5, 2, 3) target code, but which belongs to the category of $n = 2k$. For this code, we
adopt the bi-orthogonal case example in Fig. 8. From this code, we now remove all the elements
associated with the last $(n - 2k) = 1$ information unit, which corresponds to $(c_1, c_2, c_3)$. Next,
prune the last symbol $(a_3, b_3)$ of each information unit and associated elements to shrink the
storage size to 2. We can then obtain the (5, 2, 3) target code. Exact repair and the MDS-code
property of the resulting code can be verified based on the proposed framework in Section IV.

VII. Generalization: $k \leq 3$

As a side generalization, we consider the case of $k \leq 3$. We focus on the interesting special case of the
(5, 3) E-MSR code^1, since it is not covered by the above case of $k/n \leq 1/2$.
For this case, we propose another interference alignment technique building on an eigenvector
concept.

Theorem 4 ($k \leq 3$): The MSR point can be attained with a deterministic scheme requiring a
finite-field size of at most $2n - 2k$.

Proof: The case of $k = 1$ is trivial. By Theorems 2 and 3, we prove the case of $k = 2$.
However, additional effort is needed to prove the case of $k = 3$. By Theorems 2 and 3, $(n, 3)$
for $n \geq 6$ can be proved. But (5, 3) codes are not in the class. In Section VII-A, we will address
this case to complete the proof. ■

Remark 5: In order to cover general $n$, we provide a looser bound on the required finite-field
size: $q \geq 2n - 2k$. In fact, for the (5, 3) code (that will be shown in Lemma 2), a smaller finite-field
size of $q = 3$ ($< 4 = 2n - 2k$) is enough for construction. We have taken the maximum of
the required field sizes of all the cases.



A. (5, 3) E-MSR Codes 

We consider $d = 4$ and $\mathcal{M} = 6$. The cutset bound (1) then gives the fundamental limits of:
storage cost $\alpha = 2$ and repair-bandwidth-per-link $= 1$; hence, the dimension of encoding matrices
is 2-by-2. Note that the size is less than the number of systematic nodes. Therefore, our earlier
framework does not cover this category. In fact, the (5, 3) code is in the case of $n + 1 = 2k$,
where it was shown in [4] that there exist codes that achieve the cutset bound under exact repair
of systematic nodes only (not including parity nodes).

We propose an eigenvector-based interference alignment technique to prove the code existence
under exact repair of all nodes. Let $a = (a_1, a_2)^*$, $b = (b_1, b_2)^*$ and $c = (c_1, c_2)^*$. For exact
repair, we connect to $4 (= d)$ nodes to download a one-dimensional scalar value from each node.
Fig. 10 illustrates exact repair of node 1. We download four equations from survivor nodes:
$b^* v_{\alpha 1}$; $c^* v_{\alpha 2}$; $a^*(A_1 v_{\alpha 3}) + b^*(B_1 v_{\alpha 3}) + c^*(C_1 v_{\alpha 3})$; $a^*(A_2 v_{\alpha 4}) + b^*(B_2 v_{\alpha 4}) + c^*(C_2 v_{\alpha 4})$. The
approach is different from that of our earlier proposed framework. Instead, the idea here consists
of three steps: (1) choosing projection vectors for achieving interference alignment; (2) gathering
all the alignment constraints and the MDS-code constraint; (3) designing the encoding matrices
that satisfy all the constraints. Notice that the design of encoding matrices is the last part.

^1 Independently, the authors in [15] found (5, 3) codes defined over GF(3), based on a search algorithm.

Fig. 10. Eigenvector-based interference alignment for (5, 3) E-MSR codes. First we align interference "b" by setting $v_{\alpha 3} =
B_1^{-1} v_{\alpha 1}$ and $v_{\alpha 4} = B_2^{-1} v_{\alpha 1}$. Next, partially align interference of "c" by setting $v_{\alpha 2} = C_1 B_1^{-1} v_{\alpha 1}$. Finally, choosing $v_{\alpha 1}$ as
an eigenvector of $B_2 C_2^{-1} C_1 B_1^{-1}$, we can achieve interference alignment for c.

Here are the details. Note that there are 6 unknown variables: 2 desired unknowns $(a_1, a_2)$ and 4
undesired unknowns $(b_1, b_2, c_1, c_2)$. Since only 4 equations are downloaded, it is required to align
$(b_1, b_2, c_1, c_2)$ onto an at most 2-dimensional linear space. We face the challenge that appeared in the (6, 3, 5) code example
shown earlier: projection vectors $v_{\alpha 3}$ and $v_{\alpha 4}$ affect interference alignment of b and c simultaneously.
Therefore, we need simultaneous interference alignment. To solve this problem, we introduce
an eigenvector-based interference alignment scheme.

First choose $v_{\alpha 3}$ and $v_{\alpha 4}$ such that $v_{\alpha 3} = B_1^{-1} v_{\alpha 1}$ and $v_{\alpha 4} = B_2^{-1} v_{\alpha 1}$, thereby achieving
interference alignment for "b". Observe the interfering vectors associated with "c":

\[
v_{\alpha 2}; \quad C_1 B_1^{-1} v_{\alpha 1}; \quad C_2 B_2^{-1} v_{\alpha 1}.
\]

The first and second vectors can be aligned by setting $v_{\alpha 2} = C_1 B_1^{-1} v_{\alpha 1}$. Now what about the
following two vectors: $C_1 B_1^{-1} v_{\alpha 1}$ and $C_2 B_2^{-1} v_{\alpha 1}$? Suppose that the associated matrices ($C_1 B_1^{-1}$
and $C_2 B_2^{-1}$) and the projection vector $v_{\alpha 1}$ are randomly chosen. Then, these two vectors
are not guaranteed to be aligned. However, a judicious choice of $v_{\alpha 1}$ makes it possible to align
them. The idea is to choose $v_{\alpha 1}$ as an eigenvector of $B_2 C_2^{-1} C_1 B_1^{-1}$. Since $v_{\alpha 1}$ can be chosen
arbitrarily, this can be easily done. Lastly, consider the condition for ensuring the decodability
of desired signals: $\mathrm{rank}([A_1 B_1^{-1} v_{\alpha 1}\ \ A_2 B_2^{-1} v_{\alpha 1}]) = 2$.

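We repeat the procedure for exact repair of "b" and "c". For parity nodes, we employ the
remapping technique described earlier:

\[
\begin{bmatrix} a' \\ b' \\ c \end{bmatrix}
= \begin{bmatrix} A_1^* & B_1^* & C_1^* \\ A_2^* & B_2^* & C_2^* \\ 0 & 0 & I \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \end{bmatrix}, \qquad
\begin{bmatrix} A'_1 & A'_2 & 0 \\ B'_1 & B'_2 & 0 \\ C'_1 & C'_2 & I \end{bmatrix}
:= \begin{bmatrix} A_1 & A_2 & 0 \\ B_1 & B_2 & 0 \\ C_1 & C_2 & I \end{bmatrix}^{-1}. \tag{22}
\]

We gather all the conditions that need to be guaranteed for exact repair of all nodes:

\[
\begin{aligned}
&\mathrm{rank}([A_1 B_1^{-1} v_{\alpha 1}\ \ A_2 B_2^{-1} v_{\alpha 1}]) = 2, \\
&\mathrm{rank}([B_1 C_1^{-1} v_{\beta 1}\ \ B_2 C_2^{-1} v_{\beta 1}]) = 2, \\
&\mathrm{rank}([C_1 A_1^{-1} v_{\gamma 1}\ \ C_2 A_2^{-1} v_{\gamma 1}]) = 2, \\
&\mathrm{rank}([A'_1 B'^{-1}_1 v_{\alpha' 1}\ \ A'_2 B'^{-1}_2 v_{\alpha' 1}]) = 2, \\
&\mathrm{rank}([B'_1 C'^{-1}_1 v_{\beta' 1}\ \ B'_2 C'^{-1}_2 v_{\beta' 1}]) = 2,
\end{aligned} \tag{23}
\]

where

\[
\begin{aligned}
&v_{\alpha 1}: \text{an eigenvector of } B_2 C_2^{-1} C_1 B_1^{-1}, \\
&v_{\beta 1}: \text{an eigenvector of } C_2 A_2^{-1} A_1 C_1^{-1}, \\
&v_{\gamma 1}: \text{an eigenvector of } A_2 B_2^{-1} B_1 A_1^{-1}, \\
&v_{\alpha' 1}: \text{an eigenvector of } B'_2 C'^{-1}_2 C'_1 B'^{-1}_1, \\
&v_{\beta' 1}: \text{an eigenvector of } C'_2 A'^{-1}_2 A'_1 C'^{-1}_1.
\end{aligned} \tag{24}
\]

Note that eigenvectors may not exist over a finite Galois field. However, their existence can be
guaranteed by carefully choosing the encoding matrices. We provide an explicit coding scheme
in the following lemma.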

Lemma 2 ((5, 3) E-MSR Codes): Let $\alpha, \beta \in \mathrm{GF}(3)$ be non-zero. Suppose encoding matrices
$A_1, B_1, C_1, A_2, B_2, C_2$ are given by

[Six 2-by-2 triangular matrices; their entries, built from $\alpha$, $2\alpha$, $\beta$ and $2\beta$, are garbled in this copy.]

Then, the code satisfies the MDS property and achieves the MSR point (1) under exact repair
constraints of all nodes.

Proof: See Appendix C. ■

[Fig. 11 graphic: node 1 $(a_1, a_2)$ is repaired by downloading $b_1$ from node 2, $2c_1 + c_2$ from node 3,
$2a_1 + 2a_2 + b_1 + (2c_1 + c_2)$ from node 4 and $2a_1 + a_2 + b_1 + 2(2c_1 + c_2)$ from node 5. Node 4 stores
$(2a_1 + 2a_2 + b_1 + 2c_1 + c_2,\ a_2 + 2b_1 + 2b_2 + 2c_2)$ and node 5 stores $(2a_1 + a_2 + b_1 + c_1 + 2c_2,\ 2a_2 + 2b_1 + b_2 + 2c_2)$.]

Fig. 11. Illustration of exact repair of node 1 for a (5, 3) E-MSR code defined over GF(3). The eigenvector-based interference
alignment scheme makes it possible to decode the 2 desired unknowns $(a_1, a_2)$ from 4 equations containing 6 unknowns. Notice that
interference "b" and "c" are aligned simultaneously although the same projection vectors $v_{\alpha 3}$ and $v_{\alpha 4}$ are used.

Remark 6: Note that encoding matrices are lower-triangular or upper-triangular. This structure
has important properties. Not only does this structure guarantee invertibility, it can in fact
guarantee the existence of eigenvectors. It turns out that the structure above satisfies all of the
conditions needed for the MDS property and exact repair.

Example 2: Fig. 11 illustrates exact repair of node 1 $(a_1, a_2)$ for a (5, 3) E-MSR code defined
over GF(3). Notice that interference "b" and "c" are aligned simultaneously. One can check
exact repair of the remaining four nodes based on our proposed method.
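The eigenvector-based alignment of node 1 can be traced in a few lines. This is a sketch under one stated assumption: the six encoding matrices below are read off the equations stored at nodes 4 and 5 in Fig. 11 (with the convention that a parity node stores $a^* A_i + b^* B_i + c^* C_i$), not copied from the lemma, whose matrix entries are not legible here. The helper names are illustrative. The sketch finds $v_{\alpha 1}$ by brute force, derives the other projection vectors as described in Section VII-A, and recovers $a$ from the four downloaded scalars.

```python
from itertools import product

P = 3  # GF(3)

def inv2(X):
    """Inverse of an invertible 2x2 matrix mod P."""
    (a, b), (c, d) = X
    det_inv = pow((a * d - b * c) % P, P - 2, P)
    return [[d * det_inv % P, -b * det_inv % P],
            [-c * det_inv % P, a * det_inv % P]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % P for j in range(2)]
            for i in range(2)]

def mv(X, v):
    return tuple(sum(X[i][k] * v[k] for k in range(2)) % P for i in range(2))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v)) % P

def scalar_multiple(u, v):
    """Return a non-zero mu with u == mu * v, or None if u is not aligned with v."""
    for mu in (1, 2):
        if tuple(mu * x % P for x in v) == tuple(u):
            return mu
    return None

# ASSUMED encoding matrices, transcribed from the node contents in Fig. 11.
A1, B1, C1 = [[2, 0], [2, 1]], [[1, 2], [0, 2]], [[2, 0], [1, 2]]
A2, B2, C2 = [[2, 0], [1, 2]], [[1, 2], [0, 1]], [[1, 0], [2, 2]]

# Step 1: v1 is an eigenvector of B2 C2^{-1} C1 B1^{-1}.
T = mul(mul(B2, inv2(C2)), mul(C1, inv2(B1)))
v1 = next(v for v in product(range(P), repeat=2)
          if any(v) and scalar_multiple(mv(T, v), v))
# Step 2: align "b" (v3, v4) and partially align "c" (v2).
v3, v4 = mv(inv2(B1), v1), mv(inv2(B2), v1)
v2 = mv(C1, v3)
mu = scalar_multiple(mv(C2, v4), v2)     # C2 v4 = mu * v2 thanks to the eigenvector choice

# Repair node 1 for an arbitrary test file.
a, b, c = (1, 2), (2, 1), (0, 2)
d1, d2 = dot(b, v1), dot(c, v2)                                   # from nodes 2 and 3
d3 = (dot(a, mv(A1, v3)) + dot(b, mv(B1, v3)) + dot(c, mv(C1, v3))) % P   # node 4
d4 = (dot(a, mv(A2, v4)) + dot(b, mv(B2, v4)) + dot(c, mv(C2, v4))) % P   # node 5
r3 = (d3 - d1 - d2) % P                  # interference-free: a . (A1 v3)
r4 = (d4 - d1 - mu * d2) % P             # interference-free: a . (A2 v4)
solutions = [x for x in product(range(P), repeat=2)
             if dot(x, mv(A1, v3)) == r3 and dot(x, mv(A2, v4)) == r4]
```

With these matrices the search selects $v_{\alpha 1} = (1, 0)^*$, which gives the downloads $b_1$ and $2c_1 + c_2$ shown in Fig. 11, and `solutions` contains exactly the original $a$.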

VIII. Conclusion 

We have systematically developed interference alignment techniques that attain the cutset-based
MSR point (1) under exact repair constraints of all nodes. Based on the proposed framework,
we provided a generalized family of codes for the cases: (a) $k/n \leq 1/2$; (b) $k \leq 3$, for arbitrary
$n > k$. This generalized family of codes provides insights into a dual relationship between
systematic and parity node repair, and opens up a larger constructive design space of
solutions. For (5, 3) codes, which do not satisfy $k/n \leq 1/2$, we have developed an eigenvector-based
interference alignment to show the optimality of the cutset bound. Unlike wireless communication
problems, our storage repair problems have more flexibility in designing encoding matrices, which
correspond to wireless channel coefficients (provided by nature) in communication problems.
Exploiting this fact, we developed interference alignment techniques for optimal exact repair
codes in distributed storage systems.

Appendix A 
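Proof of Lemma 1

It suffices to show that

\[
\begin{bmatrix} A'_1 & A'_2 & A'_3 \\ B'_1 & B'_2 & B'_3 \\ C'_1 & C'_2 & C'_3 \end{bmatrix}
\begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{bmatrix}
= \begin{bmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}.
\]

Using (6) and (13), we compute:

\[
\begin{aligned}
(1 - \kappa^2)(A'_1 A_1 + A'_2 B_1 + A'_3 C_1)
&= (v'_1 u'^*_1 - \kappa^2 \alpha'_1 I)(u_1 v_1^* + \alpha_1 I)
 + (v'_2 u'^*_1 - \kappa^2 \beta'_1 I)(u_1 v_2^* + \beta_1 I)
 + (v'_3 u'^*_1 - \kappa^2 \gamma'_1 I)(u_1 v_3^* + \gamma_1 I) \\
&\overset{(a)}{=} (v'_1 v_1^* + v'_2 v_2^* + v'_3 v_3^*) + (\alpha_1 v'_1 + \beta_1 v'_2 + \gamma_1 v'_3) u'^*_1
 - \kappa^2 u_1 (\alpha'_1 v_1 + \beta'_1 v_2 + \gamma'_1 v_3)^* - \kappa^2 I \\
&\overset{(b)}{=} (v'_1 v_1^* + v'_2 v_2^* + v'_3 v_3^*) + \kappa u_1 u'^*_1
 - \kappa^2 u_1 (\alpha'_1 v_1 + \beta'_1 v_2 + \gamma'_1 v_3)^* - \kappa^2 I \\
&\overset{(c)}{=} (v'_1 v_1^* + v'_2 v_2^* + v'_3 v_3^*) - \kappa^2 I \\
&\overset{(d)}{=} (1 - \kappa^2) I,
\end{aligned}
\]

where (a) follows from $\alpha_1 \alpha'_1 + \beta_1 \beta'_1 + \gamma_1 \gamma'_1 = 1$ due to (11); (b) follows from (12); (c) follows
from $u'_1 = \kappa(\alpha'_1 v_1 + \beta'_1 v_2 + \gamma'_1 v_3)$ (see Claim 1); and (d) follows from the fact that $v'_1 v_1^* +
v'_2 v_2^* + v'_3 v_3^* = I$, since $(v'_1, v'_2, v'_3)$ are dual basis vectors.

Similarly, one can check that $B'_1 A_2 + B'_2 B_2 + B'_3 C_2 = I$ and $C'_1 A_3 + C'_2 B_3 + C'_3 C_3 = I$.
Now let us compute one of the cross terms:

\[
\begin{aligned}
(1 - \kappa^2)(A'_1 A_2 + A'_2 B_2 + A'_3 C_2)
&= (v'_1 u'^*_1 - \kappa^2 \alpha'_1 I)(u_2 v_1^* + \alpha_2 I)
 + (v'_2 u'^*_1 - \kappa^2 \beta'_1 I)(u_2 v_2^* + \beta_2 I)
 + (v'_3 u'^*_1 - \kappa^2 \gamma'_1 I)(u_2 v_3^* + \gamma_2 I) \\
&\overset{(a)}{=} (\alpha_2 v'_1 + \beta_2 v'_2 + \gamma_2 v'_3) u'^*_1 - \kappa^2 u_2 (\alpha'_1 v_1 + \beta'_1 v_2 + \gamma'_1 v_3)^* \\
&\overset{(b)}{=} 0,
\end{aligned}
\]

where (a) follows from $u'^*_i u_j = \delta(i - j)$ and $\langle (\alpha'_1, \beta'_1, \gamma'_1), (\alpha_2, \beta_2, \gamma_2) \rangle = 0$; (b) follows from
(12) and Claim 1. Similarly, we can check that the other cross terms are zero matrices. This
completes the proof.

Claim 1: For all $i$, $u'_i = \kappa(\alpha'_i v_1 + \beta'_i v_2 + \gamma'_i v_3)$.
Proof: By (12), we can rewrite

\[
[u_1, u_2, u_3] = \frac{1}{\kappa} [v'_1, v'_2, v'_3]
\begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \\ \beta_1 & \beta_2 & \beta_3 \\ \gamma_1 & \gamma_2 & \gamma_3 \end{bmatrix}.
\]

Using the fact that $(u'_1, u'_2, u'_3)$ are dual basis vectors, we get

\[
[u'_1, u'_2, u'_3] = \kappa [v_1, v_2, v_3]
\begin{bmatrix} \alpha'_1 & \alpha'_2 & \alpha'_3 \\ \beta'_1 & \beta'_2 & \beta'_3 \\ \gamma'_1 & \gamma'_2 & \gamma'_3 \end{bmatrix}.
\]

This completes the proof.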



Appendix B 
Proof of Theorem 2

For generalization, we are forced to use some heavy notation, but only for this section and
the related appendices. Let $w_j$ be a $k$-dimensional message vector for information unit $j$. Let
$w'_j$ be the newly mapped information unit after remapping. Let $G_i^{(j)}$ be an encoding matrix for
parity node $i$, associated with the $j$th information unit. Let $G'^{(j)}_i$ be the newly mapped entity.
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A. Exact Repair of Systematic Nodes 

For exact repair of systematic node $i$, we have each survivor node project their data

Therefore, we can achieve simultaneous interference alignment for non-intended signals, while
guaranteeing the decodability of desired signals.

B. Exact Repair of Parity Nodes

The idea is the same as that of Theorem 1. The detailed procedures are as follows. First we
remap parity nodes into new variables:

Define the newly remapped encoding matrices as:

We can now apply the generalization of Lemma 1 to obtain the dual structure:

where the dual basis vectors are defined as:

Let us check exact repair of parity node $i$. We choose projection vectors as $u_i$. Then, $\forall l =
1, \cdots, k$, we get:

Therefore, we can achieve simultaneous interference alignment for non-intended signals, while
guaranteeing the decodability of desired signals.

C. The MDS-Code Property 

We check the invertibility of a composite encoding matrix when a Data Collector connects to
$i$ systematic nodes and $(k - i)$ parity nodes for $i = 0, \cdots, k$. The main idea is to use a Gaussian
elimination method as we did in Section IV-C. The verification is tedious and therefore details
are omitted.

D. Minimum Required Finite-Field Size

Note that the dimension of an encoding matrix is $k$-by-$k$. Therefore, the minimum finite-field
size required to generate a Cauchy matrix is $2k$, i.e., $q \geq 2k$.

Appendix C 
Proof of Lemma 2

A. Exact Repair 

With the Gaussian elimination method, we get 
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(27) 



Using this, we can easily check the existence of eigenvectors (24) and the decodability of desired
signals (23). This completes the proof.
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B. The MDS-Code Property 

All the encoding matrices are invertible due to their lower-triangular or upper-triangular
structure. We consider three cases where a Data Collector connects to (1) 3 systematic
nodes; (2) 2 systematic nodes and 1 parity node; and (3) 1 systematic node and 2 parity nodes.
The first is a trivial case where the composite matrix associated with information units is an
identity matrix. The second case is also trivial, since each encoding matrix is invertible so that
the composite matrix is invertible as well. For the last case, we consider

\[
\begin{bmatrix} B_1 & B_2 \\ C_1 & C_2 \end{bmatrix}. \tag{28}
\]

It is easy to check the invertibility of this matrix via the Gaussian elimination method. The
invertibility for all the cases guarantees the MDS property.
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